3.4 Numerical Computation in Pandas

The source code for this article is available on my GitHub.

import pandas as pd
import numpy as np

3.4.1 Universal Functions: Index Preservation

rng = np.random.RandomState(42)
ser = pd.Series(rng.randint(0,10,4))
ser
0    6
1    3
2    7
3    4
dtype: int32
df = pd.DataFrame(rng.randint(0, 10, (3, 4)),
                  columns=['A', 'B', 'C', 'D'])
df
   A  B  C  D
0  6  9  2  6
1  7  4  3  7
2  7  2  5  4

Applying a NumPy universal function to either of these objects yields another Pandas object with the index preserved.

np.exp(ser)
0     403.428793
1      20.085537
2    1096.633158
3      54.598150
dtype: float64
np.sin(df * np.pi / 4)
          A             B         C             D
0 -1.000000  7.071068e-01  1.000000 -1.000000e+00
1 -0.707107  1.224647e-16  0.707107 -7.071068e-01
2 -0.707107  1.000000e+00 -0.707107  1.224647e-16
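
To make the index preservation easier to see, here is a minimal sketch with a labeled index. The Series `temps` and its labels are hypothetical illustration data, not part of the examples above.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a Series with string labels instead of 0..n-1
temps = pd.Series([0.0, 1.0, 2.0], index=['a', 'b', 'c'])

# The ufunc is applied element-wise; the labels are carried over unchanged
result = np.exp(temps)

print(type(result).__name__)  # Series
print(list(result.index))     # ['a', 'b', 'c']
```

The result is still a Series, so label-based selection such as `result['b']` keeps working after the ufunc has been applied.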

3.4.2 Universal Functions: Index Alignment

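Index alignment mainly matters in binary operations: Pandas aligns the data of the two operands by index label while computing the result.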

1. Index alignment in Series

area = pd.Series({'Alaska': 1723, 'Texas': 6871, 'California': 4235}, name='area')
population = pd.Series({'California': 1456434, 'Texas': 654687, 'New York': 4565732}, name='population')
print(area)
print()
print(population)
Alaska        1723
Texas         6871
California    4235
Name: area, dtype: int64

California    1456434
Texas          654687
New York      4565732
Name: population, dtype: int64
population / area
Alaska               NaN
California    343.904132
New York             NaN
Texas          95.282637
dtype: float64

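The result contains the union of the two input indices, with NaN filling any position where one of the inputs has no entry. If NaN is not the desired result, the arithmetic methods accept a parameter that sets the fill value used for entries missing from A or B.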

A = pd.Series([2,4,6], index= [0,1,3])
B = pd.Series([1,5,8], index= [0,1,2])
print(A)
print(B)
0    2
1    4
3    6
dtype: int64
0    1
1    5
2    8
dtype: int64
A + B
0    3.0
1    9.0
2    NaN
3    NaN
dtype: float64
# Here we can specify a custom fill value for missing entries
A.add(B, fill_value=0)
0    3.0
1    9.0
2    8.0
3    6.0
dtype: float64
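
One detail of `fill_value` is worth a quick sketch: the fill is used when only one operand is missing a value, while a position that is missing on both sides stays NaN. The Series `a` and `b` below are hypothetical illustration data.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'y' is missing on one side, 'z' is missing on both
a = pd.Series([1.0, np.nan, np.nan], index=['x', 'y', 'z'])
b = pd.Series([10.0, 20.0, np.nan], index=['x', 'y', 'z'])

total = a.add(b, fill_value=0)
print(total)
# x -> 11.0 (both present)
# y -> 20.0 (missing value in a replaced by 0)
# z -> NaN  (missing in both, so nothing to fill against)
```

The other arithmetic methods, such as `sub`, `mul`, and `div`, take the same `fill_value` parameter.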

2. Index alignment in DataFrame

C = pd.DataFrame(rng.randint(0, 20, (2, 2)),
                 columns=list("CD"))
C
    C   D
0   9  15
1  14  14
D = pd.DataFrame(rng.randint(0, 10, (3, 3)),
                 columns=list("DEC"))
D
   D  E  C
0  2  6  3
1  8  2  4
2  2  6  4
C + D
      C     D   E
0  12.0  17.0 NaN
1  18.0  22.0 NaN
2   NaN   NaN NaN
# The fill_value parameter can likewise be used to specify a custom fill value
fill = C.stack().mean()
C.add(D, fill_value=fill)
      C     D     E
0  12.0  17.0  19.0
1  18.0  22.0  15.0
2  17.0  15.0  19.0
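
The point that DataFrame alignment goes by label rather than by position can be checked with a small sketch. The frames `left` and `right` are hypothetical illustration data with the same columns in a different order.

```python
import pandas as pd

# Hypothetical data: identical column labels, opposite column order
left = pd.DataFrame({'x': [1, 2], 'y': [3, 4]})
right = pd.DataFrame({'y': [10, 20], 'x': [100, 200]})

# Columns are matched by name, so 'x' is added to 'x' and 'y' to 'y'
total = left + right

print(total['x'].tolist())  # [101, 202]
print(total['y'].tolist())  # [13, 24]
```

Because matching is done on labels, the order in which columns or rows appear in either operand does not affect which values are combined.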

3.4.3 Universal Functions: Operations Between DataFrame and Series

A = rng.randint(10, size=(3, 4))
A
array([[8, 6, 1, 3],
       [8, 1, 9, 8],
       [9, 4, 1, 3]])
A - A[0]
array([[ 0,  0,  0,  0],
       [ 0, -5,  8,  5],
       [ 1, -2,  0,  0]])
df = pd.DataFrame(A, columns=list('QWER'))
df - df.iloc[0]
   Q  W  E  R
0  0  0  0  0
1  0 -5  8  5
2  1 -2  0  0
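
Subtracting a row this way also goes through label alignment: the Series index is matched against the DataFrame's column labels. The sketch below uses a hypothetical frame `scores` to show what happens when the Series covers only some of the columns.

```python
import pandas as pd

# Hypothetical data, smaller than the array above
scores = pd.DataFrame([[8, 6, 1], [8, 1, 9]], columns=list('QWE'))
first_row = scores.iloc[0]

# Full row: subtracted from every row, matched on column labels
row_diff = scores - first_row
print(row_diff)

# Partial row: column 'W' has no matching label in the Series, so it becomes NaN
partial = scores - first_row[['Q', 'E']]
print(partial)
```

Columns without a counterpart in the Series come back as NaN, following the same union rule as the Series and DataFrame examples earlier in this section.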
df['R']
0    3
1    8
2    3
Name: R, dtype: int32